Wind Turbine Diagnosis

Introduction:

As the world looks for alternative sources to satisfy its growing demand for power, wind energy appears to be a viable, cleaner and cheaper non-conventional source of energy. Although wind is a clean fuel source, unlike conventional power plants, it comes with its own set of challenges. After being assigned as a Data Analyst to the wind farm owned by XYZ Inc., and with a lot of help from the field engineer deputed along with me, I had the opportunity to understand first-hand the functionality, operations and challenges of a wind farm. The sensors installed at various parts of the unit provide several parameters, but it was important to relate these parameters to one another and to identify the ones that primarily affect the power output. The field engineer was kind enough to help us understand these technical specifications and build these relationships, and provided us with the following theoretical power equation.

$$P = \tfrac{1}{2}\,\rho\,A\,C_p\,V^3\,N_g\,N_b$$

Where,

ρ = air density (kg/m³)

A = rotor swept area (m²)

Cp = coefficient of performance (power coefficient)

V = wind velocity (m/s)

Ng = generator efficiency

Nb = gearbox bearing efficiency
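The equation can be evaluated directly. Below is a minimal Python sketch; the rotor radius and the values chosen for Cp, Ng and Nb are hypothetical illustration values, not specifications of the XYZ Inc. turbine, and `theoretical_power_kw` is a helper name introduced here.

```python
import math

def theoretical_power_kw(rho, area_m2, cp, v, ng, nb):
    """P = 0.5 * rho * A * Cp * V**3 * Ng * Nb, returned in kW.

    rho in kg/m^3, area_m2 in m^2, v in m/s; cp, ng and nb are dimensionless (0-1).
    """
    watts = 0.5 * rho * area_m2 * cp * v ** 3 * ng * nb
    return watts / 1000.0

# Hypothetical example values (NOT taken from the turbine in this data set)
rotor_radius_m = 50.0
area = math.pi * rotor_radius_m ** 2
p = theoretical_power_kw(rho=1.225, area_m2=area, cp=0.45, v=10.0, ng=0.95, nb=0.97)
print(round(p, 1))
```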

Data Loading

In [143]:
# loading packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import datetime
import os
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import mean_squared_error
from sklearn import model_selection
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
In [144]:
# numpy, pandas, seaborn and matplotlib are already imported in the cell above
from collections import Counter
%matplotlib inline
import openpyxl  # engine pandas uses to read .xlsx files
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)
In [145]:
os.getcwd()
Out[145]:
'/Users/roshan.biswal/Desktop/Under writing'
In [147]:
#loading data set 
train_data=pd.read_excel('Data- Wind turbine diagnosis.xlsx')
In [148]:
train_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30683 entries, 0 to 30682
Data columns (total 22 columns):
DateTime                  30683 non-null datetime64[ns]
Active.Power              30683 non-null float64
Wind.Speed                30683 non-null float64
Tower.Acceleration.RMS    30683 non-null float64
Gen.Speed                 30683 non-null float64
Hub.Speed                 30683 non-null float64
Ambient.Temp              30683 non-null float64
Gearbox.Oil.Temp          30683 non-null float64
Hydraulic.Oil.Temp        30683 non-null float64
Nacelle.Temp              30683 non-null float64
PCU.Heatsink.Temp         30683 non-null float64
PCU.Blade1.Motor.Temp     30683 non-null float64
PCU.Blade2.Motor.Temp     30683 non-null float64
PCU.Blade3.Motor.Temp     30683 non-null float64
Timer.Yaw                 30683 non-null float64
Direction_x               30683 non-null float64
Yaw.Error                 30683 non-null float64
Air.Density               30683 non-null float64
Blade.Rotor.Angle         30683 non-null float64
Blade1.Pitch.Angle        30683 non-null float64
Blade2.Pitch.Angle        30683 non-null float64
Blade3.Pitch.Angle        30683 non-null float64
dtypes: datetime64[ns](1), float64(21)
memory usage: 5.2 MB
In [149]:
turbine_no="Wind Turbine Diagnosis" #for powercurve graph

Filtering data

In [150]:
##Checking for null values in columns
train_data.isnull().sum().sum()
Out[150]:
0

No null values were found in the data set.

In [151]:
# creating a working copy of the data set (a plain assignment would only add a second name for the same DataFrame)
working_data_set = train_data.copy()
In [152]:
working_data_set.head()
Out[152]:
DateTime Active.Power Wind.Speed Tower.Acceleration.RMS Gen.Speed Hub.Speed Ambient.Temp Gearbox.Oil.Temp Hydraulic.Oil.Temp Nacelle.Temp ... PCU.Blade2.Motor.Temp PCU.Blade3.Motor.Temp Timer.Yaw Direction_x Yaw.Error Air.Density Blade.Rotor.Angle Blade1.Pitch.Angle Blade2.Pitch.Angle Blade3.Pitch.Angle
0 2015-05-07 12:10:00 0.000 10.304991 0.029475 0.000000 0.000000 28.100000 33.074312 30.020161 27.798381 ... 69.054955 62.360009 3202.919722 3.698 84.324204 1.134000 120.400002 90.019997 90.000000 90.000000
1 2015-05-07 13:10:00 0.000 12.114834 0.039450 2.561702 0.008035 27.790823 32.318149 29.200001 28.711891 ... 67.185126 61.592758 3202.920226 -0.185 85.358294 1.130000 120.400002 90.023830 90.000000 90.000000
2 2015-05-07 13:30:00 0.000 11.625033 0.049417 4.439519 2.413139 27.613112 32.154716 30.962003 28.204204 ... 68.469464 66.045056 3202.924190 9.967 64.572482 1.130771 122.179998 90.027696 90.000000 90.000000
3 2015-05-07 13:40:00 0.000 10.197867 0.040475 13.661278 3.918215 27.479778 32.073000 34.325259 28.500908 ... 68.846883 67.977519 3202.926217 2.653 -1.415134 1.134517 187.350099 86.697537 86.682875 86.685850
4 2015-05-07 13:50:00 1.645 10.796599 0.020867 172.699256 5.816421 27.346445 32.650275 33.705851 29.147533 ... 62.804022 69.092141 3202.928243 -5.745 -6.674765 1.135730 181.369652 64.796145 64.799131 64.800898

5 rows × 22 columns

In [153]:
# checking if any negative power was generated
working_data_set["Active.Power"][working_data_set["Active.Power"] < 0]
Out[153]:
22892      -0.058837
30246   -9999.000000
Name: Active.Power, dtype: float64
In [154]:
# number of records with negative power
print(len(working_data_set["Active.Power"][working_data_set["Active.Power"] < 0]))
# keep only records with strictly positive power, i.e. where the turbine is producing power;
# this drops the two negative records above and also every zero-power (idle) record
working_data_set = working_data_set[working_data_set["Active.Power"] > 0].copy()

Detect anomalies across the features and address the potential causes

Based on the available data set, we found two data points where the generated power was negative.

  1. One of these two records appears to be a junk record, possibly caused by a sensor reboot. This assumption is based on the fact that the data points 10 minutes before and after the negative spike look perfectly normal. In addition, all the sensors show exactly -9999 in this record, which looks like a placeholder (sentinel) value that the system writes when it resets rather than a real measurement.

  2. The other anomaly is a small negative active power (-0.059 kW, record 22892) generated over a period of time. The entire record for that particular hour of the day contains peculiar values that deviate from standard expectations.

a. A possible cause of negative power is power flowing from the load (the power grid) to the source (the wind turbine) instead of from source to load, for example when the turbine draws power for its own systems while it is idle.

b. This reverse flow is conventionally recorded with a negative sign. The sign convention is needed for generating utility bills, i.e. to determine who pays whom.
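The notebook removes both anomalies with a single `Active.Power > 0` filter. A minimal sketch of treating them separately is shown below: it flags rows in which any numeric column holds the -9999 sentinel (based on the observation above that every sensor reads -9999 in that record) and leaves the small negative-power record in place for separate review. The helper `split_sentinel_rows` and the demo frame are hypothetical.

```python
import pandas as pd

SENTINEL = -9999  # value observed in every sensor column of record 30246

def split_sentinel_rows(df, sentinel=SENTINEL):
    """Return (clean, flagged); flagged rows hold the sentinel in any numeric column."""
    numeric = df.select_dtypes(include="number")
    is_sentinel = numeric.eq(sentinel).any(axis=1)
    return df[~is_sentinel].copy(), df[is_sentinel].copy()

# Tiny hypothetical frame mimicking the two anomalies described above
demo = pd.DataFrame({
    "Active.Power": [1200.0, -9999.0, -0.058837, 850.0],
    "Wind.Speed":   [9.1,    -9999.0, 3.2,       7.8],
})
clean, flagged = split_sentinel_rows(demo)
print(len(clean), len(flagged))
```

The small negative value survives in `clean`, so it can be examined as a possible grid-import event instead of being discarded as a sensor fault.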

Perform exploratory data analysis of all the features

In [155]:
# checking the correlation between the numeric attributes
z = working_data_set.select_dtypes(include="number").corr()
z
Out[155]:
Active.Power Wind.Speed Tower.Acceleration.RMS Gen.Speed Hub.Speed Ambient.Temp Gearbox.Oil.Temp Hydraulic.Oil.Temp Nacelle.Temp PCU.Heatsink.Temp ... PCU.Blade2.Motor.Temp PCU.Blade3.Motor.Temp Timer.Yaw Direction_x Yaw.Error Air.Density Blade.Rotor.Angle Blade1.Pitch.Angle Blade2.Pitch.Angle Blade3.Pitch.Angle
Active.Power 1.000000 0.948565 0.715468 0.906887 0.824934 -0.285430 0.660833 -0.121922 -0.026793 -0.323411 ... -0.330900 0.025536 0.310139 0.211250 0.002939 0.198990 0.006036 0.065618 0.065246 0.065243
Wind.Speed 0.948565 1.000000 0.705770 0.871339 0.808340 -0.234166 0.658967 -0.094560 0.002911 -0.283258 ... -0.290337 0.026871 0.270734 0.196715 0.024980 0.145397 0.001356 0.120780 0.120249 0.120244
Tower.Acceleration.RMS 0.715468 0.705770 1.000000 0.656446 0.643572 -0.195831 0.490888 -0.017930 0.008196 -0.280565 ... -0.248198 -0.005964 0.193925 0.234152 -0.003674 0.127008 -0.001797 0.062647 0.062757 0.062709
Gen.Speed 0.906887 0.871339 0.656446 1.000000 0.889309 -0.251615 0.645305 -0.096041 0.008935 -0.277684 ... -0.304352 0.023296 0.256192 0.217964 -0.004810 0.176568 0.007961 -0.280946 -0.282764 -0.282763
Hub.Speed 0.824934 0.808340 0.643572 0.889309 1.000000 -0.197525 0.638336 -0.039072 0.059631 -0.245312 ... -0.256672 0.012220 0.189496 0.208747 -0.008088 0.125294 0.003579 -0.208290 -0.208650 -0.208660
Ambient.Temp -0.285430 -0.234166 -0.195831 -0.251615 -0.197525 1.000000 0.281312 0.876420 0.940830 0.647997 ... 0.797781 -0.019081 -0.613206 -0.091028 0.015931 -0.977687 -0.002590 0.011781 0.013539 0.013499
Gearbox.Oil.Temp 0.660833 0.658967 0.490888 0.645305 0.638336 0.281312 1.000000 0.398923 0.533309 0.234360 ... 0.240928 0.028721 -0.078123 0.124092 0.006036 -0.340322 0.003020 0.021547 0.021812 0.021756
Hydraulic.Oil.Temp -0.121922 -0.094560 -0.017930 -0.096041 -0.039072 0.876420 0.398923 1.000000 0.915801 0.548006 ... 0.697348 0.011985 -0.513166 -0.052472 -0.000755 -0.865314 -0.004610 -0.006022 -0.004860 -0.004919
Nacelle.Temp -0.026793 0.002911 0.008196 0.008935 0.059631 0.940830 0.533309 0.915801 1.000000 0.581827 ... 0.738157 -0.005889 -0.510694 -0.037222 0.009203 -0.943250 -0.001686 -0.014747 -0.013341 -0.013380
PCU.Heatsink.Temp -0.323411 -0.283258 -0.280565 -0.277684 -0.245312 0.647997 0.234360 0.548006 0.581827 1.000000 ... 0.830598 0.063583 -0.410327 -0.110669 0.011326 -0.613360 -0.004887 -0.059788 -0.058806 -0.058873
PCU.Blade1.Motor.Temp 0.017538 0.017699 -0.005398 0.016701 0.006860 0.014038 0.046863 0.045362 0.025740 0.106277 ... 0.070478 0.399923 0.086006 -0.015278 -0.011786 -0.014127 -0.010015 -0.034587 -0.034436 -0.034472
PCU.Blade2.Motor.Temp -0.330900 -0.290337 -0.248198 -0.304352 -0.256672 0.797781 0.240928 0.697348 0.738157 0.830598 ... 1.000000 0.040145 -0.416538 -0.115539 0.006853 -0.765772 -0.007760 -0.026304 -0.023829 -0.023874
PCU.Blade3.Motor.Temp 0.025536 0.026871 -0.005964 0.023296 0.012220 -0.019081 0.028721 0.011985 -0.005889 0.063583 ... 0.040145 1.000000 0.112893 -0.006529 -0.008579 0.018086 -0.012046 -0.037697 -0.037799 -0.037808
Timer.Yaw 0.310139 0.270734 0.193925 0.256192 0.189496 -0.613206 -0.078123 -0.513166 -0.510694 -0.410327 ... -0.416538 0.112893 1.000000 0.028861 -0.011814 0.579296 -0.006357 0.036723 0.034984 0.035004
Direction_x 0.211250 0.196715 0.234152 0.217964 0.208747 -0.091028 0.124092 -0.052472 -0.037222 -0.110669 ... -0.115539 -0.006529 0.028861 1.000000 0.594390 0.071259 -0.005541 -0.046319 -0.046156 -0.046238
Yaw.Error 0.002939 0.024980 -0.003674 -0.004810 -0.008088 0.015931 0.006036 -0.000755 0.009203 0.011326 ... 0.006853 -0.008579 -0.011814 0.594390 1.000000 -0.014419 -0.003973 0.023342 0.023518 0.023528
Air.Density 0.198990 0.145397 0.127008 0.176568 0.125294 -0.977687 -0.340322 -0.865314 -0.943250 -0.613360 ... -0.765772 0.018086 0.579296 0.071259 -0.014419 1.000000 0.001130 -0.030111 -0.031826 -0.031779
Blade.Rotor.Angle 0.006036 0.001356 -0.001797 0.007961 0.003579 -0.002590 0.003020 -0.004610 -0.001686 -0.004887 ... -0.007760 -0.012046 -0.006357 -0.005541 -0.003973 0.001130 1.000000 -0.007512 -0.008618 -0.007570
Blade1.Pitch.Angle 0.065618 0.120780 0.062647 -0.280946 -0.208290 0.011781 0.021547 -0.006022 -0.014747 -0.059788 ... -0.026304 -0.037697 0.036723 -0.046319 0.023342 -0.030111 -0.007512 1.000000 0.999661 0.999661
Blade2.Pitch.Angle 0.065246 0.120249 0.062757 -0.282764 -0.208650 0.013539 0.021812 -0.004860 -0.013341 -0.058806 ... -0.023829 -0.037799 0.034984 -0.046156 0.023518 -0.031826 -0.008618 0.999661 1.000000 0.999990
Blade3.Pitch.Angle 0.065243 0.120244 0.062709 -0.282763 -0.208660 0.013499 0.021756 -0.004919 -0.013380 -0.058873 ... -0.023874 -0.037808 0.035004 -0.046238 0.023528 -0.031779 -0.007570 0.999661 0.999990 1.000000

21 rows × 21 columns

In [156]:
f, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(data=working_data_set.select_dtypes(include="number").corr(), annot=True, linewidths=.5, fmt='.1f', ax=ax)
plt.show()

To improve the productivity of the plant, the aim is to improve the parameters Cp, Ng and Nb through predictive data analytics. However, the power equation above did not always hold, because the air density and wind velocity kept varying, and these variations altered the efficiency and performance of the wind turbine. Further diagnosis showed that the changes in air density and wind velocity result from several interdependent factors. The heatmap above [Fig 1] plots these interdependencies.

Feature engineering

In [157]:
# add "Month" and "hour" columns as placeholders; their values are filled from the timestamp below
working_data_set["Month"] = working_data_set["Active.Power"]
working_data_set["hour"] = working_data_set["Active.Power"]  # placeholder
# rho * V^3: the wind-dependent part of the theoretical power equation
working_data_set["formula"] = working_data_set["Air.Density"] * working_data_set["Wind.Speed"] ** 3

In [158]:
#rename 
working_data_set.rename(columns={'DateTime':'Time'},inplace=True)
In [160]:
# function for finding the month name from a timestamp
MONTH_NAMES = ["Jan", "Feb", "March", "April", "May", "June",
               "July", "August", "Sep", "Oct", "Nov", "Dec"]

def find_month(a):
    # a is a pandas Timestamp (or datetime); .month runs from 1 to 12
    return MONTH_NAMES[a.month - 1]
In [161]:
working_data_set.Month=working_data_set.Time.apply(find_month)
In [162]:
# function for finding the hour of the day
def find_hour(a):
    # a is a pandas Timestamp (or datetime); .hour runs from 0 to 23
    return a.hour
In [163]:
working_data_set.hour=working_data_set.Time.apply(find_hour)
In [164]:
# check which hours of the day are present in the data set
working_data_set.hour.unique()
Out[164]:
array([13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,  0,  1,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  2])
In [165]:
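# function for finding the day of the month
def find_day(a):
    # a is a pandas Timestamp (or datetime); .day runs from 1 to 31
    return a.day
In [166]:
# bracket assignment creates a real column (attribute assignment does not)
working_data_set["day"] = working_data_set["Time"].apply(find_day)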

In [167]:
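# bin each wind speed to the nearest 0.5 m/s (bin centres 0.0, 0.5, ..., 25.0);
# speeds of 25.25 m/s or more fall outside the bins and return None
def mean_speed(x):
    edge = 0.25
    while edge <= 25.5:
        if x < edge:
            return edge - 0.25
        edge += 0.5
    return None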
In [168]:
#add a new column as "mean_WindSpeed" with function mean_speed().
working_data_set["mean_WindSpeed"]=working_data_set["Wind.Speed"].apply(mean_speed)

In [169]:
# create a summary dataframe: mean of every numeric column per wind-speed bin
DepGroupT_speed = working_data_set.groupby("mean_WindSpeed")
data_T_speed = DepGroupT_speed.mean(numeric_only=True)
In [170]:
# replace the wind-speed bin index with a simple 1..n index
data_T_speed["Index"] = list(range(1, len(data_T_speed.index) + 1))
data_T_speed.set_index("Index", inplace=True)
data_T_speed.index.name = None

Theoretical Power Equation vs. Practical Power Equation:

In [171]:
# Power curve with plotly

trace1 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Active.Power"],
                    mode = "lines+markers",
                    name = "ActivePower(kW)",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "{} Power vs. Wind Speed Fig 2".format(turbine_no),
              xaxis= dict(title= 'Wind Speed (m/s)',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Power',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)
In [172]:
# AIR DENSITY VS POWER
# sorting the binned data by power
data_T_power = data_T_speed.sort_values('Active.Power')

trace1 = go.Scatter(
                    x = data_T_power["Active.Power"],
                    y = data_T_power["Air.Density"],
                    mode = "lines+markers",
                    name = "AIR DENSITY",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "AIR DENSITY VS POWER  fig 3".format(turbine_no),
              xaxis= dict(title= 'Power',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Air Density',ticklen= 5,zeroline= False,range = [1.15,1.25]),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

It can be seen from the power equation that the generated power is proportional to the cube of the wind velocity and directly proportional to the air density. The graphs above ([Fig 2], [Fig 3]) were plotted to verify this. Power rises steeply with wind speed (roughly as V³, a power law rather than an exponential), while its relationship with air density is only partially linear. Interestingly, at rated power, further increases in air density and wind speed produce no significant change in power output: many points cluster at about 2500 kW in both [Fig 2] and [Fig 3]. The generator coupled with the wind turbine appears to have reached its maximum power output, so the rated power of the unit is approximately 2500 kW. Generators are typically designed to operate most efficiently close to their rated power, so our intention is to help the unit generate its rated power as often as possible.
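Because power scales as ρ·V³ below rated power, the relative effect of each factor can be compared directly. The sketch below (a hypothetical helper, holding A, Cp, Ng and Nb fixed) contrasts doubling the wind speed with moving across the 1.15 to 1.25 kg/m³ air-density range used for the [Fig 3] axis.

```python
def power_ratio(v1, v2, rho1=1.225, rho2=1.225):
    """P2 / P1 predicted by P proportional to rho * V**3 (A, Cp, Ng, Nb held fixed).

    Only meaningful below rated power, where the output is not capped.
    """
    return (rho2 * v2 ** 3) / (rho1 * v1 ** 3)

print(power_ratio(6.0, 12.0))                       # doubling the wind speed
print(power_ratio(8.0, 8.0, rho1=1.15, rho2=1.25))  # denser air at the same wind speed
```

Doubling the wind speed multiplies the predicted power by 2³ = 8, whereas the whole observed density range shifts it by roughly 9%, which is why wind speed dominates the power curve.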

In [194]:
trace1 = go.Scatter(
                    x = data_T_speed["formula"],
                    y = data_T_speed["Active.Power"],
                    mode = "lines+markers",
                    name = "ActivePower(kW)",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "Wind Speed (m/s)^3*airdensity vs Power Curve fig 4".format(turbine_no),
              xaxis= dict(title= 'Wind Speed (m/s)^3*airdensity',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Power ',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

[Fig 4] above shows the relationship between power and ρ·V³, where ρ is the air density and V is the wind velocity.
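Since everything in the power equation other than ρ and V is ideally fixed for a given turbine, the slope of [Fig 4] below rated power estimates the lumped constant 0.5·A·Cp·Ng·Nb. The sketch below fits a line through the origin to the points below 90% of rated power. The 90% cut-off, the helper name and the synthetic data (generated from a known slope and capped at 2500 kW) are assumptions for illustration; in the notebook, `data_T_speed["formula"]` holds ρ·V³ and `data_T_speed["Active.Power"]` the power.

```python
import numpy as np

def fit_power_coefficient(rho_v3, power_kw, rated_kw, frac=0.9):
    """Least-squares slope k of P ~= k * rho*V^3, using only points below frac*rated_kw."""
    x = np.asarray(rho_v3, dtype=float)
    y = np.asarray(power_kw, dtype=float)
    mask = y < frac * rated_kw          # exclude the rated-power plateau
    # slope of a line through the origin: k = sum(x*y) / sum(x*x)
    return float(np.sum(x[mask] * y[mask]) / np.sum(x[mask] ** 2))

# Synthetic check: build data from a known k and a 2500 kW cap, then recover k
rng = np.random.default_rng(0)
v = rng.uniform(4, 20, 500)
rho = rng.uniform(1.10, 1.25, 500)
true_k = 1.7
p = np.minimum(true_k * rho * v ** 3, 2500.0)
k_hat = fit_power_coefficient(rho * v ** 3, p, rated_kw=2500.0)
print(round(k_hat, 3))
```

With P in kW and ρ·V³ in W/m², the slope corresponds to 0.5·A·Cp·Ng·Nb / 1000, so 2000·k divided by a known swept area gives an effective Cp·Ng·Nb. On real data, noise and the cut-in region make this only an average estimate.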

Challenges at Hand:

XYZ Inc., the manufacturer, deployed us to help find answers to the following questions, which would help them improve efficiency and, in turn, maximize power output. The answers below are based on analysis of the data received from the sensors. Please note that they may not be in the exact order in which the questions were asked.

Perform exploratory data analysis of all the features (use descriptive statistics and visualization with their interpretations)

[Fig 1] shows the pairwise (Pearson) correlations between all the parameters. Each coefficient lies between -1.0 and +1.0.

• A coefficient of +1.0 means a perfect positive linear relationship: the two parameters rise and fall together.

• A coefficient of -1.0 means a perfect negative (inverse) linear relationship: one parameter rises as the other falls. Values near 0 indicate little linear relationship.

Key Interpretations:

• Wind Speed (r ≈ 0.95) and Tower Acceleration RMS (r ≈ 0.72) are strongly and positively correlated with the generated Active Power.

• Ambient temperature is weakly negatively correlated with power output (r ≈ -0.29): as temperature increases, the generated power tends to decrease slightly.

• Generator Speed is strongly and positively correlated with Active Power (r ≈ 0.91).

o Higher generator speed is associated with higher gearbox oil temperature (r ≈ 0.65).

o Gearbox oil temperature also rises with ambient temperature (r ≈ 0.28).

o Gearbox oil temperature and PCU heat-sink temperature are weakly positively correlated (r ≈ 0.23).

o Heat-sink temperature is negatively correlated with power (r ≈ -0.32). This is consistent with thermal losses reducing the efficiency of power generation, although correlation alone does not establish the cause.
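The correlation values quoted above can be reproduced by taking the `Active.Power` column of the correlation matrix and sorting it by absolute value. The sketch below runs on a small synthetic frame (made-up values) standing in for `working_data_set`; `rank_correlations` is a hypothetical helper. Pearson correlation only captures linear association, so a high coefficient does not by itself establish cause and effect.

```python
import numpy as np
import pandas as pd

def rank_correlations(df, target="Active.Power"):
    """Pearson correlation of every numeric column with `target`, strongest first."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic stand-in for working_data_set (values are made up)
rng = np.random.default_rng(1)
wind = rng.uniform(3, 20, 1000)
demo = pd.DataFrame({
    "Wind.Speed": wind,
    "Ambient.Temp": rng.normal(20, 5, 1000),
    "Active.Power": np.minimum(2.0 * wind ** 3, 2500) + rng.normal(0, 50, 1000),
})
print(rank_correlations(demo))
```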

In [174]:
trace1 = go.Scatter(
                    x = data_T_speed["Active.Power"],
                    y = data_T_speed["Ambient.Temp"],
                    mode = "lines+markers",
                    name = "Ambient Temp",
                    line = dict(width = 3))
trace2 = go.Scatter(
                    x = data_T_speed["Active.Power"],
                    y = data_T_speed["Gearbox.Oil.Temp"],
                    mode = "lines+markers",
                    name = "Gearbox Oil Temp",
                    line = dict(width = 3))
trace3 = go.Scatter(
                    x = data_T_speed["Active.Power"],
                    y = data_T_speed["Hydraulic.Oil.Temp"],
                    mode = "lines+markers",
                    name = "Hydraulic Oil Temp",
                    line = dict(width = 3))
trace4 = go.Scatter(
                    x = data_T_speed["Active.Power"],
                    y = data_T_speed["Nacelle.Temp"],
                    mode = "lines+markers",
                    name = "Nacelle Temp",
                    line = dict(width = 3))
trace5 = go.Scatter(
                    x = data_T_speed["Active.Power"],
                    y = data_T_speed["PCU.Heatsink.Temp"],
                    mode = "lines+markers",
                    name = "PCU Heatsink Temp",
                    line = dict(width = 3))



data = [trace1,trace2,trace3,trace4,trace5]
layout = dict(title = "{} temp vs power Curve".format(turbine_no),
              xaxis= dict(title= 'Power',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Temperature',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

• Direction_x appears to be the direction (inclination) of the incoming wind. In the correlation table it is only weakly correlated with the generated power (r ≈ 0.21), the generator speed (r ≈ 0.22) and the wind speed (r ≈ 0.20), and essentially uncorrelated with Timer Yaw (r ≈ 0.03). Its strongest relationship is with Yaw Error (r ≈ 0.59), which fits this interpretation, since yaw error measures the misalignment between the nacelle and the wind direction.

• Also, we have already concluded from [Fig 3] and from the theoretical equation, that the Air Density is directly proportional to the Generated Power Output.

Perform cross sectional analysis of active power and wind speed, wind speed and pitch data, wind speed and temperature data.

Active Power and Wind Speed:

In [175]:
trace1 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Active.Power"],
                    mode = "lines+markers",
                    name = "ActivePower(kW)",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "{} Power vs Wind Speed Curve Fig 7 ".format(turbine_no),
              xaxis= dict(title= 'Wind Speed (m/s)',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Power',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

Rated Speed, Cut-In Speed and Cut-Out Speed:

Every wind turbine, based on its location and its design specifications, has its own dedicated Rated Speed, cut-in speed and cut-out speed.

  1. The rated speed is the wind speed at which the turbine first reaches its rated (maximum) power output.

  2. The cut-in speed is the wind speed at which the rotor starts turning and active power generation begins.

  3. Conversely, the cut-out speed is the wind speed at which the turbine shuts down to avoid damage. At such speeds the mechanical loads on the blades, drivetrain and tower become excessive, so the turbine is stopped to protect it.

  4. These speeds are important for deriving the efficiency of the wind turbine. The graph above [Fig 7] can be used to estimate them.

  5. The rated speed of the turbine is around 13-15 m/s.

  6. The cut-in speed of the turbine is approximately 4-5 m/s.

  7. The cut-out speed of the turbine is close to 25 m/s
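These speeds can also be read off the binned power curve programmatically. The sketch below is one simple way to do it, under stated assumptions: the cut-in speed is taken as the first bin whose mean power exceeds a small threshold, and the rated speed as the first bin reaching 95% of the maximum mean power. Both thresholds and the `curve_speeds` helper are hypothetical; in the notebook the inputs would be the `Wind.Speed` and `Active.Power` columns of `data_T_speed`. Because zero-power records were filtered out earlier, the low-speed bins are biased upward, and the cut-out speed cannot be estimated this way since no shut-down records remain.

```python
import numpy as np

def curve_speeds(wind_bins, mean_power, cut_in_kw=10.0, rated_frac=0.95):
    """Estimate (cut-in, rated) wind speed from a binned power curve.

    cut-in: first bin whose mean power exceeds cut_in_kw (hypothetical threshold)
    rated:  first bin whose mean power reaches rated_frac * max(mean_power)
    If no bin satisfies a condition, np.argmax returns 0 and the lowest bin is reported.
    """
    v = np.asarray(wind_bins, dtype=float)
    p = np.asarray(mean_power, dtype=float)
    order = np.argsort(v)
    v, p = v[order], p[order]
    cut_in = v[np.argmax(p > cut_in_kw)]
    rated = v[np.argmax(p >= rated_frac * p.max())]
    return float(cut_in), float(rated)

# Made-up power curve on 0.5 m/s bins, capped at 2500 kW
bins = np.arange(0.0, 25.5, 0.5)
power = np.clip(2.0 * (bins ** 3 - 4.0 ** 3), 0.0, 2500.0)
print(curve_speeds(bins, power))
```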

Wind Speed and Pitch Data

Below [Fig 8] is the relationship between Wind Speed and Pitch Data

In [176]:
#Wind Speed VS Pitch Data
trace1 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Blade1.Pitch.Angle"],
                    mode = "lines+markers",
                    name = "Blade1.Pitch.Angle",
                    line = dict(width = 3))
trace2 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Blade2.Pitch.Angle"],
                    mode = "lines+markers",
                    name = "Blade2.Pitch.Angle",
                    line = dict(width = 3))
trace3 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Blade3.Pitch.Angle"],
                    mode = "lines+markers",
                    name = "Blade3.Pitch.Angle",
                    line = dict(width = 3))



data = [trace1,trace2,trace3]
layout = dict(title = "{} Wind Speed VS Pitch Curve Fig 8".format(turbine_no),
              xaxis= dict(title= 'Wind Speed (m/s)',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Blade Pitch Angle',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

Wind Speed and Temperature Data:

Below [Fig 9] is the relationship between Wind Speed and Temperature

In [177]:
#Wind.Speed vs Ambient.Temp



trace1 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Ambient.Temp"],
                    mode = "lines+markers",
                    name = "Ambient Temp",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "{} Wind Speed vs Ambient Temp fig 9".format(turbine_no),
              xaxis= dict(title= 'Wind Speed ',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Ambient Temp',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

Explore the major features that contribute to fluctuations in power output and temperatures

What are optimal conditions for these features to generate maximum power output from the turbine

Based on the power equation, following are the major factors that contribute to fluctuations in wind power output theoretically:

  1. Area of rotor:

• From the power equation it can be inferred that power is directly proportional to the rotor swept area (area swept by the blades of the wind turbine).

• So, the longer the turbine blades, the greater the power output.

• This parameter being a design specification, remains constant throughout.
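The rotor swept area is the area of the circle traced by the blade tips, A = πR², where R is the rotor radius. Because R is squared, power grows with the square of the blade length. The sketch below (a hypothetical helper with illustrative radii) makes that explicit.

```python
import math

def swept_area(rotor_radius_m):
    """Area of the circle traced by the blade tips, A = pi * R**2, in m^2."""
    return math.pi * rotor_radius_m ** 2

a_40 = swept_area(40.0)
a_80 = swept_area(80.0)
print(round(a_40), round(a_80), a_80 / a_40)  # doubling R quadruples A
```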

  2. Wind Velocity:

• Wind speed increases with the height of the turbine above the ground.

• Wind turbines are therefore mounted at an optimum height above ground level to obtain the maximum power output.

• Wind velocity also appears to vary with ambient temperature, as seen in [Fig 9].

• Wind velocity also appears to vary with air density, as seen in [Fig 10].

In [178]:
# Wind Speed (m/s) vs Air Density
trace1 = go.Scatter(
                    x = data_T_speed["Wind.Speed"],
                    y = data_T_speed["Air.Density"],
                    mode = "lines+markers",
                    name = "Air Density",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = "{} Wind Speed (m/s) vs Air Density Curve Fig 10 ".format(turbine_no),
              xaxis= dict(title= 'Wind Speed (m/s)',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Air Density',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)
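The increase of wind speed with height described above is often approximated by a power-law wind-shear profile, V(h) = V_ref · (h / h_ref)^α, with α ≈ 1/7 as a common textbook value for open terrain. This data set contains no height measurements, so the exponent and the heights in the sketch below are assumptions; real shear at a site depends on terrain and atmospheric stability.

```python
def wind_at_height(v_ref, h_ref, h, alpha=1.0 / 7.0):
    """Power-law wind-shear estimate of wind speed at height h (same units as h_ref)."""
    return v_ref * (h / h_ref) ** alpha

v_hub = wind_at_height(6.0, 10.0, 80.0)  # 6 m/s measured at 10 m, hub assumed at 80 m
print(round(v_hub, 2), round((v_hub / 6.0) ** 3, 2))  # hub speed and cubic power gain
```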

  3. Air Density:

• Air density depends on temperature and weather conditions.

• Varying temperatures and weather conditions change the air density, which in turn affects the wind speed.

• Fluctuating temperatures and wind speeds lead to fluctuating power output.

• Air density is highest at sea level. Because wind power is directly proportional to air density, an increase in air density increases power output.

• At higher altitudes air density decreases significantly, so a wind farm in the mountains produces less power for the same wind speed than one at sea level.

• Air density alone therefore does not favour ever-taller towers: any gain from height comes from higher wind speed, not from denser air.

• The highest values of air density are found at sea level.
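The link between temperature and air density follows from the ideal gas law, ρ = p / (R·T), with R ≈ 287.05 J/(kg·K) for dry air. The sketch below assumes dry air at standard sea-level pressure (101325 Pa); humidity, which lowers density slightly, is ignored, and `air_density` is a hypothetical helper.

```python
R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)

def air_density(pressure_pa, temp_c):
    """Dry-air density from the ideal gas law: rho = p / (R * T), in kg/m^3."""
    return pressure_pa / (R_DRY_AIR * (temp_c + 273.15))

rho_15 = air_density(101325.0, 15.0)  # standard sea-level conditions
rho_35 = air_density(101325.0, 35.0)  # same pressure, hot day
print(round(rho_15, 3), round(rho_35, 3), round(rho_15 / rho_35, 3))
```

Since power is proportional to ρ below rated output, the same wind at 15 °C yields roughly 7% more power than at 35 °C under these assumptions.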

In [179]:
# sorting the data for temp
data_T_temp=data_T_speed.sort_values('Ambient.Temp')

# Air Density vs Ambient Temp
trace1 = go.Scatter(
                    x = data_T_temp["Ambient.Temp"],
                    y = data_T_temp["Air.Density"],
                    mode = "lines+markers",
                    name = "Air Density",
                    line = dict(width = 3))
data = [trace1]
layout = dict(title = " {} Air Density vs Ambient Temp".format(turbine_no),
              xaxis= dict(title= 'Ambient Temp ',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Air Density',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)
  1. Temperature:

• From [Fig 12] (below) it can be observed that the wind turbine operates normally within the temperature range from -20 degrees to 30 degrees.

• Outside this range, temperature has a noticeable influence on the output power.

• If the temperature is too high, the air density is low, which leads to a drop in the output power.

• If the temperature is too low, the blades and other parts may freeze, and the wind turbine may stop working.

o This problem can be addressed with effective operation and maintenance techniques.

o Sensors can be used to detect the build-up of ice on the rotor.

o A control unit can be designed to process the signals received from the sensors and activate an ice removal mechanism.

o A similar approach is used in the aeronautical industry to remove ice from aircraft propellers, where icing impairs the functioning of the aircraft.

o Another approach is to apply an anti-adhesive coating, such as Teflon, to the blades.
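The control-unit idea above could start from a simple rule: flag possible icing when cold weather coincides with a large shortfall against the expected power for the measured wind speed. This is a hypothetical sketch; the thresholds and the `expected_power_kw` input (which would come from a reference power curve) are assumptions, not values from this wind farm.

```python
# Hypothetical rule-based icing flag (a sketch, not the wind farm's actual control logic).
# Assumptions: temperatures in deg C, wind speed in m/s, power in kW; expected_power_kw
# comes from a reference power curve for the measured wind speed; thresholds are illustrative.
ICE_TEMP_C = 2.0           # only consider icing at or below this ambient temperature
POWER_DEFICIT_RATIO = 0.7  # flag if actual power is below 70 % of expected power
MIN_WIND_SPEED = 4.0       # ignore low-wind periods where low output is normal


def possible_icing(ambient_temp_c, wind_speed, actual_power_kw, expected_power_kw):
    """Return True when cold weather coincides with a large power shortfall."""
    if wind_speed < MIN_WIND_SPEED or expected_power_kw <= 0:
        return False
    cold = ambient_temp_c <= ICE_TEMP_C
    underperforming = actual_power_kw < POWER_DEFICIT_RATIO * expected_power_kw
    return cold and underperforming


print(possible_icing(-5.0, 9.0, 300.0, 900.0))
```

In practice a flag like this would be confirmed by the ice sensors before any removal mechanism is triggered, since a power shortfall in cold weather can have other causes.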

In [180]:
# Ambient and component temperatures vs active power
temp_cols = [("Ambient.Temp", "Ambient Temp"),
             ("Gearbox.Oil.Temp", "Gearbox Oil Temp"),
             ("Hydraulic.Oil.Temp", "Hydraulic Oil Temp"),
             ("Nacelle.Temp", "Nacelle Temp"),
             ("PCU.Heatsink.Temp", "PCU Heatsink Temp")]

# sort by power so each line is drawn left to right instead of zig-zagging
data_T_power = data_T_speed.sort_values("Active.Power")

# one trace per temperature column
data = [go.Scatter(
            x = data_T_power["Active.Power"],
            y = data_T_power[col],
            mode = "lines+markers",
            name = label,
            line = dict(width = 3))
        for col, label in temp_cols]
layout = dict(title = "{} temp vs power Curve".format(turbine_no),
              xaxis= dict(title= 'Power',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Temperature',ticklen= 5,zeroline= False),
              legend=dict(x=0.5, y=0.5))
fig = dict(data = data, layout = layout)
iplot(fig)

Prepare descriptive statistics based on daily, weekly and monthly roll-ups
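The notebook groups by its `hour` and `Month` columns; when a full timestamp is available, pandas `resample()` produces daily, weekly and monthly roll-ups directly. In this sketch the column name `Timestamp` and the synthetic 10-minute readings are hypothetical stand-ins for `working_data_set`.

```python
import numpy as np
import pandas as pd

# Sketch of daily / weekly / monthly roll-ups with pandas resample().
# Assumption: readings carry a timestamp; the "Timestamp" column and the synthetic
# 10-minute data below are hypothetical stand-ins for working_data_set.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=6 * 24 * 60, freq="10min")  # 60 days
df = pd.DataFrame({
    "Timestamp": idx,
    "Active.Power": rng.uniform(0, 1500, len(idx)),
    "Wind.Speed": rng.uniform(0, 20, len(idx)),
})


def rollup(frame, rule):
    """Mean, std, min and max of each numeric column per resampling period."""
    return frame.set_index("Timestamp").resample(rule).agg(["mean", "std", "min", "max"])


daily = rollup(df, "D")
weekly = rollup(df, "W")     # weeks ending on Sunday
monthly = rollup(df, "MS")   # labelled by calendar-month start
print(monthly["Active.Power"])
```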

Daily Variations in Key Parameters:

Intraday Power Variations:

In [181]:
# case 9: hourly series used in the intraday box plots
x1=working_data_set['hour']
y1=working_data_set['Active.Power']
z1=working_data_set['Wind.Speed']
a1=working_data_set['Ambient.Temp']

• It can be observed in [Fig 12] that most of the outliers in generated power seem to occur between 6 AM and 8 AM.

• It has already been established, both theoretically and statistically, that power is directly correlated with wind speed. From [Fig 13] it may be inferred that the largest number of wind speed outliers occurs between 6 AM and 9 AM.

• In the early morning hours, there seems to be a significant dip in temperature.
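The outlier observations above can be checked numerically by counting, for each hour, the points outside Q1 − 1.5·IQR and Q3 + 1.5·IQR, which is the whisker rule box plots use by default. `outliers_per_hour` is a hypothetical helper; in the notebook `frame` would be `working_data_set`.

```python
import pandas as pd


# Sketch: count Tukey outliers (outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR) per hour of day.
# Column names follow the notebook; the helper itself is hypothetical.
def outliers_per_hour(frame, value_col, hour_col="hour"):
    def count(series):
        q1, q3 = series.quantile(0.25), series.quantile(0.75)
        iqr = q3 - q1
        return int(((series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)).sum())
    return frame.groupby(hour_col)[value_col].agg(count)


demo = pd.DataFrame({
    "hour": [6] * 6 + [12] * 6,
    "Active.Power": [10, 11, 12, 13, 14, 100, 50, 51, 52, 53, 54, 55],
})
print(outliers_per_hour(demo, "Active.Power"))
```

Running the same helper on `Active.Power` and `Wind.Speed` gives a count per hour that can be compared against the box plots.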

Intraday Wind Variations:

In [182]:
trace1 = go.Box(
    y=z1,
    x=x1,
    name='Wind Speed',
    marker=dict(
        color='#FF4136'
    ),boxmean='sd'
)
data = [ trace1]
layout = go.Layout(
    yaxis=dict(
        title='Wind speed distribution for every hour of the day',
        zeroline=False
    ),
    boxmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

Intraday Temperature Distribution:

In [183]:
trace1 = go.Box(
    y=a1,
    x=x1,
    name='Ambient Temp',
    marker=dict(
        color='#FF4136'
    ),boxmean='sd'
)
data = [ trace1]
layout = go.Layout(
    yaxis=dict(
        title='Temperature distribution for every hour of the day',
        zeroline=False
    ),
    boxmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
In [184]:
import plotly.graph_objs as go

y_data=y1
x_data=x1

trace0 = go.Box(
    y=y_data,
    x=x_data,
    name='Active Power',
    marker=dict(
        color='#3D9970'
    ),
    boxmean='sd'
)

data = [trace0]
layout = go.Layout(
    yaxis=dict(
        title='Power generated every hour of the day (Fig 10)',
        zeroline=False
    ),
    boxmode='group'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

Intraday Prediction:

It can be inferred from the above graphs that the wind speed is highest in the early morning, i.e. 6 AM to 9 AM, when the temperature is low and the air density is high. As the wind speed increases, the generated wind power also increases, with the cube of the wind speed according to the power equation.
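The cubic dependence on wind speed explains why a modest rise in wind speed has a large effect on power. The sketch below evaluates the theoretical power equation from the introduction; the rotor area, Cp and efficiency values are hypothetical and chosen only to show the scaling.

```python
# Sketch of the theoretical power equation P = 0.5 * rho * A * Cp * V^3 * Ng * Nb.
# The parameter values below are hypothetical, chosen only to show the scaling.
def theoretical_power_w(rho, area_m2, cp, v, ng, nb):
    """Theoretical power in watts (SI units)."""
    return 0.5 * rho * area_m2 * cp * v ** 3 * ng * nb


params = dict(rho=1.225, area_m2=5000.0, cp=0.4, ng=0.95, nb=0.97)
p_8 = theoretical_power_w(v=8.0, **params)
p_8_8 = theoretical_power_w(v=8.8, **params)  # 10 % higher wind speed
print(round(p_8_8 / p_8, 3))  # -> 1.331, i.e. 1.1 ** 3
```

A 10 % increase in wind speed therefore gives about 33 % more power, whereas a 10 % increase in air density gives only 10 % more.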

Monthly Variations in Key Parameters:

The graphs below give a clear picture of how wind speed, air density and temperature affect the wind power output. The higher the temperature, the lower the air density; in this data, these periods also show lower wind speeds and lower power output. This trend can be observed from May to September.

Do you notice the turbine being more productive during any part of the day (e.g. the middle of the night or the middle of the day)?

The turbine seems to be more productive during the early hours of the day, i.e. 5 AM to 9 AM ([Fig 10]), when temperatures are cooler than during the rest of the day. The air density is therefore higher, and together with the higher wind speeds at these hours this results in a higher generated power output.

Is there any time dependency or seasonal effect in turbine performance?

In [185]:
# monthly series used for Fig 15 to Fig 18
x1=working_data_set['Month']
y1=working_data_set['Active.Power']
z1=working_data_set['Wind.Speed']
a1=working_data_set['Ambient.Temp']
b1=working_data_set['Air.Density']
In [186]:
data = [
  go.Histogram(
    histfunc = "avg",
    y = z1,
    x = x1,
    name = "avg"
  )]

layout = go.Layout(
    title='Avg wind speed over months fig 15',
    xaxis=dict(
        title='Month'
    ),
    yaxis=dict(
        title='Avg wind speed'
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')
In [187]:
data = [
  go.Histogram(
    histfunc = "avg",
    y = a1,
    x = x1,
    name = "avg"
  )]

layout = go.Layout(
    title='Avg Temp over months fig 16',
    xaxis=dict(
        title='Month'
    ),
    yaxis=dict(
        title='Avg Temp'
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')
In [188]:
data = [
  go.Histogram(
    histfunc = "avg",
    y = b1,
    x = x1,
    name = "avg"
  )]

layout = go.Layout(
    title='Air Density over months fig 17',
    xaxis=dict(
        title='Month'
    ),
    yaxis=dict(
        title='Air Density',range=[1.1,1.25]
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')
In [189]:
import plotly.graph_objs as go


data = [
  go.Histogram(
    histfunc = "sum",
    y = y1,
    x = x1,
    name = "sum"
  )]

layout = go.Layout(
    title='Total power generated over months fig 18',
    xaxis=dict(
        title='Month'
    ),
    yaxis=dict(
        title='Total power generated over the month'
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')

From the monthly statistics graphs ([Fig 15] to [Fig 18]) it can be inferred that the generated wind power is highest during winter (October to December), when the wind speeds are also higher. In these months the temperature is at its lowest and the air density at its highest compared to the rest of the year.
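The "Total power generated" histogram sums individual power readings, which is proportional to energy only if the readings are evenly spaced. A minimal sketch of the conversion to kWh, assuming power in kW and a hypothetical 10-minute logging interval (the actual SCADA interval should be substituted):

```python
import pandas as pd

# Sketch: convert summed power readings (kW) to energy (kWh).
# Assumption: readings are evenly spaced; the 10-minute interval is hypothetical.
INTERVAL_HOURS = 10 / 60.0


def monthly_energy_kwh(frame, month_col="Month", power_col="Active.Power",
                       interval_hours=INTERVAL_HOURS):
    """Energy per month = sum of power readings * interval length in hours."""
    return frame.groupby(month_col)[power_col].sum() * interval_hours


demo = pd.DataFrame({"Month": [1] * 6 + [2] * 3,
                     "Active.Power": [600.0] * 6 + [300.0] * 3})
print(monthly_energy_kwh(demo))
```

If some months have gaps in the data, comparing energy per month can still be misleading; dividing by the number of readings (average power) avoids that bias.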

Impute the missing values using appropriate reasoning and methods.

The missing parameters are ‘Area swept by the rotor’ and the ‘efficiency’ of the generator.

A graph of the following relationship is needed:

$P = K \, \rho \, V^3$, where $K = 0.0000665\,(C_p A)$ and the rotor swept area $A$ is a constant which never changes.
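One way to impute the lumped constant K from the data is a least-squares fit through the origin of P against x = ρV³, which gives K = Σ(x·P) / Σ(x²). This is a sketch, not the notebook's method (which uses the ratio P/(ρV³) directly); the fit is only meaningful below rated wind speed, where the cubic law holds, and `estimate_k` / `impute_power` are hypothetical helpers.

```python
import numpy as np


# Sketch: estimate K in P = K * rho * V^3 by least squares through the origin,
# K = sum(x * P) / sum(x ** 2) with x = rho * V^3.
# Non-finite rows and calm periods (x == 0) are dropped before fitting.
def estimate_k(power, rho, wind_speed):
    power = np.asarray(power, dtype=float)
    x = np.asarray(rho, dtype=float) * np.asarray(wind_speed, dtype=float) ** 3
    mask = np.isfinite(power) & np.isfinite(x) & (x > 0)
    x, power = x[mask], power[mask]
    return float(np.sum(x * power) / np.sum(x ** 2))


def impute_power(k, rho, wind_speed):
    """Fill a missing power reading from the fitted K."""
    return k * rho * wind_speed ** 3


v = np.array([5.0, 6.0, 7.0, 8.0])
rho = np.full(4, 1.2)
p = 0.5 * rho * v ** 3          # synthetic readings with K = 0.5
print(estimate_k(p, rho, v))
```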

In [190]:
# Efficiency factor P / (rho * V^3); with the swept area constant it tracks Cp * Ng * Nb.
# .copy() makes working_data_set an independent frame, avoiding SettingWithCopyWarning.
working_data_set = working_data_set.copy()
working_data_set["missing_value"] = working_data_set["Active.Power"] / (
    working_data_set["Air.Density"] * working_data_set["Wind.Speed"] ** 3)
# calm periods (wind speed 0) give inf; treat them as missing
working_data_set["missing_value"] = working_data_set["missing_value"].replace([np.inf, -np.inf], np.nan)

In [191]:
# normalise by the maximum so the factor reads as relative efficiency (NaNs are skipped)
z1 = working_data_set["missing_value"] / working_data_set["missing_value"].max()
data = [
  go.Histogram(
    histfunc = "avg",
    y = z1,
    x = x1,
    name = "avg"
  )]

layout = go.Layout(
    title='Relative Efficiency by Month',
    xaxis=dict(
        title='Month'
    ),
    yaxis=dict(
        title='Efficiency'
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')

Since the rotor swept area is constant, the efficiency is proportional to the factor $P/(\rho V^3)$. The graph shows how this factor changes over the 8 months of data.
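Dividing the theoretical power equation from the introduction by $\rho V^3$ shows why this factor isolates the efficiency terms:

```latex
P = \tfrac{1}{2}\,\rho\,A\,C_p\,V^{3}\,N_g\,N_b
\quad\Longrightarrow\quad
\frac{P}{\rho\,V^{3}} = \tfrac{1}{2}\,A\,C_p\,N_g\,N_b \;\propto\; C_p\,N_g\,N_b
\qquad (A \text{ constant})
```

Normalising by the maximum observed value, as done in the code above, turns this into a relative efficiency between 0 and 1.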

Conclusion:

Based on the above analysis, the following variable parameters play a major role in determining the power output of a wind turbine.

  1. Wind Speed

  2. Air Density

  3. Temperature

The following observations can help improve the efficiency of the power system as a whole.

  1. Wind speed is a natural phenomenon which cannot be controlled, but the above analysis gives an understanding of how the wind speed varies.

  2. With a drop in temperature, we see an increase in the air density, and periods of higher air density also show higher wind speeds.

a. An increase in the wind speed leads to an increase in the generated power.

  3. During winters, significant drops in temperature (below 0 degrees) may lead to a drop in the generated power, even when wind speeds are high.

a. This is because ice may accumulate on the blades and bring the wind turbine to a standstill.

b. To prevent this, a control unit can be designed to process the signals received from the sensors and activate a possible ice removal mechanism.

c. Another approach is to apply an anti-adhesive coating, such as Teflon, to the blades.

  4. At higher temperatures during summer, the resulting low air density is a natural phenomenon and cannot be controlled.

a. The efficiency of the system can be improved by employing methods to cool the Heat Sink Temperature, Gear Box Oil Temperature, Blade Motor Temperatures etc.

b. A control unit can be designed to detect spikes in the above-mentioned temperatures and take the necessary action to bring them back down to an optimum value.
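The spike detection in point b could be prototyped with a rolling baseline: a reading is flagged when it exceeds the mean of the preceding readings by more than a chosen number of standard deviations. This is a hypothetical sketch; the window length and threshold are illustrative assumptions, and the input would be a series such as `Gearbox.Oil.Temp` or `PCU.Heatsink.Temp`.

```python
import pandas as pd


# Hypothetical spike detector for a component temperature series.
# A reading is flagged when it exceeds the rolling mean of the preceding `window`
# readings by more than `n_std` rolling standard deviations (illustrative values).
def temperature_spikes(temps, window=6, n_std=3.0):
    s = pd.Series(temps, dtype=float)
    baseline = s.shift(1).rolling(window, min_periods=window)  # excludes the current reading
    mean, std = baseline.mean(), baseline.std()
    return s > mean + n_std * std


flags = temperature_spikes([60, 61, 60, 61, 60, 61, 80, 61])
print(flags.tolist())
```

Using only the preceding readings for the baseline keeps a spike from inflating its own threshold; the first `window` readings are never flagged because no full baseline exists yet.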
